February 16, 2021

Gene expression

I start working with the data by identifying the differentially expressed genes.

To do so, I compare the means of the control and disease groups, starting with a simple two-sample t-test. For each gene, I first check whether the variances of the two groups are equal, and only then compare the group means.
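The per-gene step can be sketched in Python with SciPy. This is not the code that produced the results here. It rests on a few assumptions:

- Levene's test performs the variance check. The text does not name the variance test, and an F-test would be another option.
- The 0.05 threshold and the function name are illustrative.

```python
from scipy import stats


def t_test_with_variance_check(control, disease, alpha=0.05):
    """Two-sample t-test for one gene, choosing Student's or Welch's variant.

    Levene's test is an assumed choice for the variance-equality check.
    Returns (p-value of the t-test, whether equal variances were assumed).
    """
    # H0 of Levene's test: both groups have the same variance
    equal_var = bool(stats.levene(control, disease).pvalue >= alpha)
    # equal_var=True -> Student's t-test, equal_var=False -> Welch's t-test
    p_mean = stats.ttest_ind(control, disease, equal_var=equal_var).pvalue
    return float(p_mean), equal_var


# Made-up expression values for a single gene
p, equal_var = t_test_with_variance_check([5.1, 4.9, 5.3, 5.0], [6.2, 6.0, 6.5, 6.1])
```

Welch's variant is the safer fallback, because it does not assume a common variance when the check rejects equality.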

P-values from t-test: before and after correction

Gene expression - corrected test scheme

The number of differentially expressed genes turned out to be very high. I therefore check whether the normality assumption distorts the results by applying a new test scheme:

  • Check whether the values in both groups are normally distributed.
  • If they are, perform a t-test, checking the equality of variances beforehand.
  • If they are not, perform a Mann-Whitney test.
  • Correct the obtained p-values for multiple testing with the Benjamini-Hochberg method.
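The scheme can be sketched in Python, again assuming SciPy. It makes these assumptions:

- Shapiro-Wilk is the normality test.
- Levene's test performs the variance check.
- The threshold is 0.05 throughout.

None of these choices is stated in the text. The Benjamini-Hochberg adjustment is written out by hand to show the step-up logic.

```python
from scipy import stats


def gene_pvalue(control, disease, alpha=0.05):
    """Raw p-value for one gene, following the test scheme above."""
    # Shapiro-Wilk normality check on each group (needs >= 3 values per group)
    normal = (stats.shapiro(control).pvalue >= alpha
              and stats.shapiro(disease).pvalue >= alpha)
    if normal:
        # Variance check (Levene's test, an assumed choice) picks Student vs. Welch
        equal_var = bool(stats.levene(control, disease).pvalue >= alpha)
        return float(stats.ttest_ind(control, disease, equal_var=equal_var).pvalue)
    # Not normal: rank-based Mann-Whitney U test
    return float(stats.mannwhitneyu(control, disease, alternative="two-sided").pvalue)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up expression values for two genes (one list per gene)
control = [[5.1, 4.9, 5.3, 5.0, 5.2], [2.0, 2.1, 1.9, 2.2, 2.0]]
disease = [[6.2, 6.0, 6.5, 6.1, 6.3], [2.1, 2.0, 2.2, 1.9, 2.1]]
raw = [gene_pvalue(c, d) for c, d in zip(control, disease)]
corrected = benjamini_hochberg(raw)
```

For example, raw p-values `[0.01, 0.04, 0.03, 0.005]` are adjusted to `[0.02, 0.04, 0.04, 0.02]`.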

P-values after taking the distribution into account

Distribution effect

I compare the results obtained when the distribution is taken into account with those obtained under the previous normality assumption.
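One way to make this comparison concrete is to cross-tabulate which genes are called significant under each scheme. This is a hypothetical Python helper, not part of the original analysis. It assumes a 0.05 threshold on the corrected p-values and inputs given as dicts mapping gene id to corrected p-value.

```python
def compare_significant(p_before, p_after, alpha=0.05):
    """Cross-tabulate significant genes under two testing schemes.

    p_before / p_after: dicts mapping gene id -> corrected p-value.
    """
    sig_before = {g for g, p in p_before.items() if p < alpha}
    sig_after = {g for g, p in p_after.items() if p < alpha}
    return {
        "both": len(sig_before & sig_after),
        "only_t_test": len(sig_before - sig_after),
        "only_new_scheme": len(sig_after - sig_before),
    }


# Toy corrected p-values for three hypothetical genes
summary = compare_significant({"G1": 0.01, "G2": 0.20, "G3": 0.03},
                              {"G1": 0.02, "G2": 0.04, "G3": 0.50})
```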

Enrichment analysis

Having identified the differentially expressed genes, I proceed with enrichment analysis. I start with over-representation analysis (ORA) and then move on to functional class scoring (FCS) methods.

ORA

##                                                 Title corrected_pvals
## 100                                     RNA transport    4.259285e-08
## 133                                        Cell cycle    4.259285e-08
## 305                               MicroRNAs in cancer    1.182302e-07
## 260                                 Alzheimer disease    1.960257e-06
## 265                                     Prion disease    1.960257e-06
## 295           Human T-cell leukemia virus 1 infection    1.960257e-06
## 300                                Pathways in cancer    2.529952e-06
## 88                                 Metabolic pathways    4.283633e-06
## 263                                Huntington disease    4.283633e-06
## 266 Pathways of neurodegeneration - multiple diseases    4.546480e-06
## 115                            Fanconi anemia pathway    6.612935e-06
## 170                                    Focal adhesion    6.612935e-06
## 304                           Proteoglycans in cancer    6.612935e-06
## 262                     Amyotrophic lateral sclerosis    7.685330e-06
## 101                         mRNA surveillance pathway    2.554206e-05
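ORA p-values are commonly computed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. Assuming that is the test behind the table above, the calculation for a single gene set can be sketched with the standard library only. The `corrected_pvals` column would then come from a multiple-testing correction across all gene sets, presumably Benjamini-Hochberg as for the genes.

```python
from math import comb


def ora_pvalue(n_universe, n_set, n_de, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap) for one gene set.

    n_universe: number of genes tested
    n_set:      genes of the universe that belong to the gene set
    n_de:       differentially expressed genes
    n_overlap:  differentially expressed genes inside the gene set
    """
    # Sum P(X = k) over k = n_overlap .. largest possible overlap
    tail = sum(comb(n_set, k) * comb(n_universe - n_set, n_de - k)
               for k in range(n_overlap, min(n_set, n_de) + 1))
    # Exact integer arithmetic, single division at the end
    return tail / comb(n_universe, n_de)


# 10 genes, a set of 4, 3 DE genes, all 3 of them inside the set
p = ora_pvalue(10, 4, 3, 3)  # 4 / 120 = 1/30
```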

CERNO

##                                       Title corrected_pvals
## 88                       Metabolic pathways    2.368986e-08
## 133                              Cell cycle    2.368986e-08
## 300                      Pathways in cancer    2.368986e-08
## 295 Human T-cell leukemia virus 1 infection    4.941884e-08
## 170                          Focal adhesion    1.231707e-07
## 305                     MicroRNAs in cancer    5.131701e-07
## 159      Vascular smooth muscle contraction    5.800451e-07
## 116                  MAPK signaling pathway    2.139096e-06
## 119                  Rap1 signaling pathway    2.139096e-06
## 125             Chemokine signaling pathway    4.895844e-06
## 304                 Proteoglycans in cancer    4.895844e-06
## 148              PI3K-Akt signaling pathway    5.527751e-06
## 144                             Endocytosis    5.931361e-06
## 177     Complement and coagulation cascades    5.931361e-06
## 121              cGMP-PKG signaling pathway    7.034663e-06
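CERNO scores a gene set from the ranks of its genes in the list of all genes sorted by significance. Its statistic is F = -2 Σ ln(R_i / N), compared with a chi-square distribution with 2k degrees of freedom, where k is the set size. For even degrees of freedom the chi-square tail has a closed form, so the standard-library sketch below needs no statistics package. It assumes ranks run from 1 (most significant) to N.

```python
import math


def chi2_sf_even_df(x, k):
    """P(X > x) for a chi-square variable with 2k degrees of freedom (closed form)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def cerno_pvalue(ranks, n_genes):
    """CERNO p-value for one gene set.

    ranks: 1-based ranks of the set's genes among all n_genes genes,
           sorted from most to least significant (assumed ordering)
    """
    statistic = -2.0 * sum(math.log(r / n_genes) for r in ranks)
    return chi2_sf_even_df(statistic, len(ranks))


p_single = cerno_pvalue([5], 100)   # a one-gene "set" at rank 5 of 100 gives 5/100
p_top = cerno_pvalue([1, 2], 1000)  # two genes at the very top of the list
```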

Z-transform

##                                                 Title corrected_pvals
## 88                                 Metabolic pathways    9.001854e-28
## 300                                Pathways in cancer    1.231452e-22
## 266 Pathways of neurodegeneration - multiple diseases    9.693740e-16
## 295           Human T-cell leukemia virus 1 infection    1.234043e-15
## 148                        PI3K-Akt signaling pathway    2.047843e-14
## 116                            MAPK signaling pathway    6.369432e-14
## 133                                        Cell cycle    6.369432e-14
## 170                                    Focal adhesion    6.369432e-14
## 260                                 Alzheimer disease    2.115722e-13
## 305                               MicroRNAs in cancer    3.002434e-13
## 304                           Proteoglycans in cancer    4.702153e-13
## 265                                     Prion disease    3.590622e-12
## 263                                Huntington disease    4.907427e-12
## 119                            Rap1 signaling pathway    6.906820e-12
## 294                    Human papillomavirus infection    5.373930e-11
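The text does not define the Z-transform method. Assuming it is Stouffer's combination of one-sided gene-level p-values within a set, a standard-library sketch looks like this. Under that assumption the combined z-score grows roughly with the square root of the number of consistently significant genes. Large sets such as "Metabolic pathways" can therefore reach extremely small p-values.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mu=0, sigma=1


def z_transform_pvalue(gene_pvals):
    """Stouffer's Z-transform of one-sided gene-level p-values (each in (0, 1))."""
    # z_i = Phi^{-1}(1 - p_i), written as -Phi^{-1}(p_i) to keep precision for tiny p
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in gene_pvals]
    combined = sum(z_scores) / sqrt(len(z_scores))
    # Upper-tail probability of the combined z-score
    return _STD_NORMAL.cdf(-combined)


p_one = z_transform_pvalue([0.05])              # a single gene returns its own p-value
p_set = z_transform_pvalue([0.01, 0.02, 0.03])  # about 1.5e-4
```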

Results comparison

The Z-transform p-values are much smaller than those of the other methods. Its top gene set has a corrected p-value of about 9e-28, compared with about 4e-08 for ORA and 2e-08 for CERNO. I also check how many gene sets each method finds enriched, and how many gene sets appear in the results of all three methods.

##                    ORA CERNO Z_transform
## enriched gene sets 114   200         263
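The counts and the joint sets can be reproduced with plain set operations. The text does not state the significance threshold behind these numbers, so 0.05 is an assumption here. The input layout (one dict per method, mapping gene set ID to corrected p-value) and all p-values below are made up for illustration.

```python
def enriched_sets(results, alpha=0.05):
    """IDs of gene sets whose corrected p-value is below alpha."""
    return {gene_set for gene_set, p in results.items() if p < alpha}


def summarize_methods(results_by_method, alpha=0.05):
    """Count enriched sets per method and find the sets enriched by every method."""
    enriched = {m: enriched_sets(r, alpha) for m, r in results_by_method.items()}
    counts = {m: len(s) for m, s in enriched.items()}
    joint = set.intersection(*enriched.values())
    return counts, joint


# Toy corrected p-values keyed by KEGG IDs
counts, joint = summarize_methods({
    "ORA": {"hsa04110": 0.01, "hsa05200": 0.50},
    "CERNO": {"hsa04110": 0.001, "hsa05200": 0.02},
    "Z_transform": {"hsa04110": 1e-5, "hsa05200": 0.03, "hsa04510": 0.04},
})
```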

Joint gene sets

## Number of joint enriched gene sets:  111
##                                              Title   ora cerno     z
##                                 Metabolic pathways 4e-06 2e-08 9e-28
##                                 Pathways in cancer 3e-06 2e-08 1e-22
##  Pathways of neurodegeneration - multiple diseases 5e-06 2e-05 1e-15
##            Human T-cell leukemia virus 1 infection 2e-06 5e-08 1e-15
##                         PI3K-Akt signaling pathway 2e-03 6e-06 2e-14
##                                         Cell cycle 4e-08 2e-08 6e-14
##                                     Focal adhesion 7e-06 1e-07 6e-14
##                             MAPK signaling pathway 1e-03 2e-06 6e-14
##                                  Alzheimer disease 2e-06 1e-04 2e-13
##                                MicroRNAs in cancer 1e-07 5e-07 3e-13

GSEA implementation

Signal to noise absolute - p-values

The MATLAB output clearly looks anomalous.
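When checking one implementation against another, a reference version of the GSEA enrichment score helps. The score is the signed maximum deviation from zero of a running sum that walks down the ranked gene list. The sum increases at genes in the set, weighted by |metric|^p, and decreases by a constant at genes outside it.

The sketch below follows that standard definition. It is neither the author's nor MATLAB's code, and the function name and default p = 1 are assumptions.

```python
def enrichment_score(ranked_genes, metric, gene_set, weight=1.0):
    """GSEA running-sum enrichment score for one gene set.

    ranked_genes: gene ids sorted by decreasing ranking metric
    metric:       ranking-metric values in the same order
    gene_set:     collection of gene ids (at least one hit with nonzero metric)
    weight:       exponent p applied to |metric| for hits (p = 1 is the usual choice)
    """
    in_set = set(gene_set)
    hits = [g in in_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    # Total weight of all hits, so the hit increments sum to 1
    hit_total = sum(abs(m) ** weight for m, h in zip(metric, hits) if h)
    running, es = 0.0, 0.0
    for m, h in zip(metric, hits):
        if h:
            running += abs(m) ** weight / hit_total
        else:
            running -= 1.0 / n_miss  # misses also sum to 1 overall
        # ES is the signed running-sum value farthest from zero
        if abs(running) > abs(es):
            es = running
    return es


genes = ["A", "B", "C", "D"]
scores = [4.0, 3.0, 2.0, 1.0]
es_top = enrichment_score(genes, scores, {"A", "B"})     # set at the top: ES = 1
es_bottom = enrichment_score(genes, scores, {"C", "D"})  # set at the bottom: ES = -1
```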

Signal to noise absolute - ES

## P-value:  4.997284e-42
## Correlation coefficient:  -0.6518573
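Each "P-value / Correlation coefficient" pair looks like the output of a correlation test between two vectors, for example the results of this implementation and of the MATLAB one. Assuming it is a Pearson correlation test, SciPy computes both numbers in one call. The vectors below are placeholders, not data from this analysis.

```python
from scipy import stats


def correlation_test(x, y):
    """Pearson correlation coefficient and its two-sided p-value (H0: rho = 0)."""
    r, p = stats.pearsonr(x, y)
    return float(r), float(p)


# Placeholder scores for six gene sets from two implementations
ours = [0.61, -0.42, 0.35, 0.78, -0.15, 0.05]
other = [0.58, -0.40, 0.30, 0.80, -0.20, 0.10]
r, p = correlation_test(ours, other)
```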

Signal to noise - p-values

Signal to noise - ES

## P-value:  0.261582
## Correlation coefficient:  0.06141717

LFC absolute - p-values

LFC absolute - ES

## P-value:  2.1936e-26
## Correlation coefficient:  -0.5360192

LFC - p-values

LFC - ES

## P-value:  0.007659464
## Correlation coefficient:  0.1452507
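The four variants above differ only in the gene ranking metric. With the usual GSEA definitions:

- signal-to-noise is (mean_disease - mean_control) / (sd_disease + sd_control);
- the log fold change is log2(mean_disease / mean_control);
- the "absolute" variants rank genes by the magnitude of the metric.

The sketch below assumes positive, non-log-transformed expression values and a disease-versus-control direction. For log-scale data the LFC would instead be a difference of means. The helper names and dict layout are hypothetical.

```python
from math import log2
from statistics import mean, stdev


def signal_to_noise(control, disease):
    """GSEA-style signal-to-noise: (mean_d - mean_c) / (sd_d + sd_c)."""
    return (mean(disease) - mean(control)) / (stdev(disease) + stdev(control))


def log_fold_change(control, disease):
    """log2 ratio of group means; assumes positive, non-log expression values."""
    return log2(mean(disease) / mean(control))


def rank_genes(expr_control, expr_disease, metric, absolute=False):
    """Gene ids sorted by decreasing metric (or |metric| when absolute=True).

    expr_control / expr_disease: dicts gene id -> list of expression values.
    """
    values = {g: metric(expr_control[g], expr_disease[g]) for g in expr_control}
    key = (lambda g: abs(values[g])) if absolute else (lambda g: values[g])
    return sorted(values, key=key, reverse=True), values


control = {"G1": [1.0, 2.0, 3.0], "G2": [4.0, 4.0, 5.0], "G3": [2.0, 2.5, 3.0]}
disease = {"G1": [2.0, 3.0, 4.0], "G2": [1.0, 1.5, 2.0], "G3": [2.0, 2.5, 3.5]}
order_lfc, lfc = rank_genes(control, disease, log_fold_change)
order_abs, _ = rank_genes(control, disease, log_fold_change, absolute=True)
```

Here the down-regulated gene G2 ends up last in the signed ranking but first in the absolute one. This is why the signed and absolute variants can give quite different enrichment results.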

P-values correlation

Combining p-values

##                                            Title pval_combined
## hsa04022              cGMP-PKG signaling pathway  0.000000e+00
## hsa05206                     MicroRNAs in cancer  0.000000e+00
## hsa05200                      Pathways in cancer  3.037172e-29
## hsa04110                              Cell cycle  7.314069e-29
## hsa05166 Human T-cell leukemia virus 1 infection  7.656778e-23
## hsa01100                      Metabolic pathways  9.091406e-23
## hsa04510                          Focal adhesion  1.620879e-22
## hsa04270      Vascular smooth muscle contraction  1.888406e-20
## hsa05205                 Proteoglycans in cancer  4.510265e-20
## hsa04015                  Rap1 signaling pathway  1.295541e-19
## hsa04010                  MAPK signaling pathway  5.947362e-19
## hsa04360                           Axon guidance  7.986816e-19
## hsa04530                          Tight junction  2.354755e-18
## hsa03013                           RNA transport  2.641794e-18
## hsa04151              PI3K-Akt signaling pathway  9.328139e-18
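The section does not say how the p-values are combined. Fisher's method is one common choice. Under the null hypothesis, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom for k independent p-values. Independence is an assumption that need not hold when the inputs come from related ranking variants.

Below is a standard-library sketch using the closed-form tail for even degrees of freedom. In double precision, exp(-X/2) underflows to 0 once X/2 exceeds roughly 745. This is one possible explanation for the 0.000000e+00 entries in the table above.

```python
import math


def fisher_combined_pvalue(pvals):
    """Fisher's method for k independent p-values, each strictly greater than 0."""
    k = len(pvals)
    half_stat = -sum(math.log(p) for p in pvals)  # X / 2, where X = -2 * sum(ln p)
    # Chi-square survival function with 2k df: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


p_single = fisher_combined_pvalue([0.05])    # one p-value is returned unchanged
p_pair = fisher_combined_pvalue([0.5, 0.5])  # 0.25 * (1 + 2 ln 2), about 0.597
```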